8 research outputs found
Severity Classification of Parkinson's Disease from Speech using Single Frequency Filtering-based Features
Developing objective methods for assessing the severity of Parkinson's
disease (PD) is crucial for improving its diagnosis and treatment. This study
proposes two sets of novel features derived from the single frequency filtering
(SFF) method: (1) SFF cepstral coefficients (SFFCC) and (2) MFCCs from the SFF
(MFCC-SFF) for the severity classification of PD. Prior studies have
demonstrated that SFF offers greater spectro-temporal resolution compared to
the short-time Fourier transform. The study uses the PC-GITA database, which
includes speech of PD patients and healthy controls produced in three speaking
tasks (vowels, sentences, text reading). Experiments using the SVM classifier
revealed that the proposed features outperformed the conventional MFCCs in all
three speaking tasks. The proposed SFFCC and MFCC-SFF features gave relative
improvements of 5.8% and 2.3% for the vowel task, 7.0% and 1.8% for the sentence
task, and 2.4% and 1.1% for the read text task, in comparison to the MFCC features.
Comment: Accepted by INTERSPEECH 202
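The cepstral recipe behind features like SFFCC can be sketched as follows: take a magnitude spectrum, apply log compression, and decorrelate with a DCT. This is only an illustration of the general idea; the SFF envelope extraction used in the paper is not reproduced, and the random spectrum below is a stand-in for a real speech frame.

```python
import numpy as np
from scipy.fft import dct

def cepstral_coefficients(mag_spectrum, n_coeffs=13):
    """Cepstral coefficients from a magnitude spectrum: log, then DCT.

    The same recipe applies to any spectral front end (STFT, SFF, ...);
    SFFCC would replace this magnitude spectrum with the SFF envelope.
    """
    log_spec = np.log(mag_spectrum + 1e-10)      # compress dynamic range
    return dct(log_spec, type=2, norm='ortho')[:n_coeffs]

# Toy example: one frame of a 256-bin magnitude spectrum.
rng = np.random.default_rng(0)
frame = np.abs(rng.standard_normal(256))
ccs = cepstral_coefficients(frame)
print(ccs.shape)  # (13,)
```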
Wav2vec-based Detection and Severity Level Classification of Dysarthria from Speech
Automatic detection and severity level classification of dysarthria directly
from acoustic speech signals can be used as a tool in medical diagnosis. In
this work, the pre-trained wav2vec 2.0 model is studied as a feature extractor
to build detection and severity level classification systems for dysarthric
speech. The experiments were carried out with the widely used UA-Speech
database. In the detection experiments, the results revealed that the best
performance was obtained using the embeddings from the first layer of the
wav2vec model that yielded an absolute improvement of 1.23% in accuracy
compared to the best performing baseline feature (spectrogram). In the studied
severity level classification task, the results revealed that the embeddings
from the final layer gave an absolute improvement of 10.62% in accuracy
compared to the best baseline features (mel-frequency cepstral coefficients).
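Layer-wise embeddings of the kind described above are typically turned into one utterance-level vector by pooling over frames. The sketch below shows only that pooling step on synthetic hidden states whose shapes mimic a wav2vec2-base model (13 layers, 768 dimensions); the actual model inference is omitted, so these names and sizes are assumptions, not the paper's code.

```python
import numpy as np

def utterance_embedding(hidden_states, layer):
    """Mean-pool frame-level embeddings from one chosen layer.

    hidden_states: list of (n_frames, dim) arrays, one per layer, as a
    wav2vec 2.0-style model would expose them; pooling over time gives
    a fixed-length vector for a downstream classifier.
    """
    return hidden_states[layer].mean(axis=0)

# Synthetic stand-in: 13 layers, 50 frames, 768-dim features.
rng = np.random.default_rng(1)
states = [rng.standard_normal((50, 768)) for _ in range(13)]
first_layer_vec = utterance_embedding(states, layer=1)
print(first_layer_vec.shape)  # (768,)
```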
Automatic Classification of Vocal Intensity Category from Speech
Vocal intensity regulation is a fundamental phenomenon in speech communication. In speech science, the term vocal intensity refers to the acoustic energy of speech, and it is quantified by sound pressure level (SPL). Unlike, for example, loudspeaker amplifiers, which adjust sound intensity by affecting only the gain, the regulation of intensity in speech is much more complex and challenging because it is based on the physiological speech production mechanism. The speech signal carries acoustic cues about the vocal intensity category/SPL that the speaker used when the corresponding speech signal was produced. Due to the lack of proper calibration information in existing speech databases, it is not possible to estimate the true vocal intensity category/SPL used in recordings. In addition, there is only one previous study on the automatic classification of vocal intensity category. In the current study, a large speech database representing four vocal intensity categories (soft, normal, loud, and very loud) was recorded from 50 speakers together with calibration information. Two automatic machine learning-based classification systems were developed using support vector machines (SVMs) and convolutional neural networks (CNNs), with mel-frequency cepstral coefficients (MFCCs) as features. The results show that the best classification accuracy (about 65%) was obtained using the SVM classifier.
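A pipeline of the kind described (MFCC-type features into an SVM for a four-category problem) can be sketched with scikit-learn. The synthetic features, class separation, and hyperparameters below are illustrative assumptions, not the study's data or settings.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for utterance-level MFCC vectors: four intensity
# categories (soft, normal, loud, very loud), 200 samples each, with
# class means shifted so the problem is learnable.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((200, 13)) + c for c in range(4)])
y = np.repeat(np.arange(4), 200)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# Standardize features, then fit an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel='rbf', C=1.0))
clf.fit(X_tr, y_tr)
print(round(clf.score(X_te, y_te), 2))
```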
Automatic classification of the severity level of Parkinson’s disease: A comparison of speaking tasks, features, and classifiers
Automatic speech-based severity level classification of Parkinson's disease (PD) enables objective assessment and earlier diagnosis. While many studies have been conducted on the binary classification task to distinguish speakers with PD from healthy controls (HCs), clearly fewer studies have addressed multi-class PD severity level classification. Furthermore, in studying the three main issues of speech-based classification systems (speaking tasks, features, and classifiers), previous investigations of severity level classification have yielded inconclusive results due to the use of only a few, and sometimes just one, type of speaking task, feature, or classifier in each study. Hence, a systematic comparison is conducted in this study between different speaking tasks, features, and classifiers. Five speaking tasks (vowel task, sentence task, diadochokinetic (DDK) task, read text task, and monologue task), four features (phonation, articulation, prosody, and their fusion), and four classifier architectures (support vector machine (SVM), random forest (RF), multilayer perceptron (MLP), and AdaBoost) were compared. The classification task studied was a 3-class problem: classifying PD severity level as healthy vs. mild vs. severe. Two MDS-UPDRS scales (MDS-UPDRS-III and MDS-UPDRS-S) were used for the ground truth severity level labels. The results showed that the use of the monologue task and the articulation and fusion features improved classification accuracy significantly compared to the use of the other speaking tasks and features. The best classification systems resulted in an accuracy of 58% (using the monologue task with the articulation features) for the MDS-UPDRS-III scale and 56% (using the monologue task with the fusion of features) for the MDS-UPDRS-S scale. Peer reviewed.
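A systematic comparison of the four classifier architectures named above can be sketched with cross-validation in scikit-learn. The synthetic three-class features below stand in for the healthy/mild/severe problem; dimensions, separations, and hyperparameters are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Synthetic 3-class stand-in for healthy vs. mild vs. severe features.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((100, 20)) + 0.8 * c for c in range(3)])
y = np.repeat(np.arange(3), 100)

classifiers = {
    'SVM': SVC(),
    'RF': RandomForestClassifier(random_state=0),
    'MLP': MLPClassifier(max_iter=500, random_state=0),
    'AdaBoost': AdaBoostClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each architecture.
scores = {}
for name, clf in classifiers.items():
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f'{name}: {scores[name]:.2f}')
```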
Classification of vocal intensity category from speech using the wav2vec2 and whisper embeddings
In speech communication, talkers regulate vocal intensity, resulting in speech signals of different intensity categories (e.g., soft, loud). Intensity category carries important information about the speaker's health and emotions. However, many speech databases lack calibration information, and therefore sound pressure level cannot be measured from the recorded data. Machine learning, however, can be used in intensity category classification even though calibration information is not available. This study investigates pre-trained model embeddings (Wav2vec2 and Whisper) in the classification of vocal intensity category (soft, normal, loud, and very loud) from speech signals expressed on arbitrary amplitude scales. We use a new database consisting of two speaking tasks (sentence and paragraph). A support vector machine is used as the classifier. Our results show that the pre-trained model embeddings outperformed three baseline features, providing improvements of up to 7% (absolute) in accuracy. Peer reviewed.
Towards battery-less RF sensing
Funding Information: The authors appreciate partial funding from the Academy of Finland project ABACUS. Publisher Copyright: © 2021 IEEE. Recent work has demonstrated the use of the radio interface as a sensing modality for gestures, activities, and situational perception. The field generally moves towards larger bandwidths, multiple antennas, and higher mmWave frequency domains, which allow for the recognition of minute movements. We envision another set of applications for RF sensing: battery-less autonomous sensing devices. In this work, we investigate transceiver-less passive RF sensors that are excited by fluctuations of the received power over the wireless channel. In particular, we demonstrate the use of battery-less RF sensing for on-body gesture recognition integrated into a smart garment, as well as the integration of such sensing capabilities into smart surfaces. Peer reviewed.
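Since the sensing principle above rests on fluctuations of received power, a minimal software-side illustration is to flag motion by thresholding the sliding-window variance of a power trace. This is a generic sketch on simulated data, not the paper's hardware or signal processing; the trace, window length, and threshold are all assumptions.

```python
import numpy as np

# Simulated received-power trace (dBm): a quiet channel with a burst of
# channel fluctuation standing in for a gesture.
rng = np.random.default_rng(0)
rssi = rng.normal(-60.0, 0.2, 400)                            # quiet
rssi[150:250] += 3 * np.sin(np.linspace(0, 6 * np.pi, 100))   # gesture

# Sliding-window variance; high variance marks channel fluctuation.
win = 20
var = np.array([rssi[i:i + win].var() for i in range(len(rssi) - win)])
motion = var > 1.0   # simple, hand-picked threshold
print(motion.any())  # True
```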
Comparing 1-dimensional and 2-dimensional spectral feature representations in voice pathology detection using machine learning and deep learning classifiers
This work was supported by the Academy of Finland (grant number 313390). The computational resources were provided by Aalto ScienceIT. The present study investigates the use of 1-dimensional (1-D) and 2-dimensional (2-D) spectral feature representations in voice pathology detection with several classical machine learning (ML) and recent deep learning (DL) classifiers. Four widely used spectral feature representations (static mel-frequency cepstral coefficients (MFCCs), dynamic MFCCs, spectrogram, and mel-spectrogram) are derived in both the 1-D and 2-D form from voice signals. Three widely used ML classifiers (support vector machine (SVM), random forest (RF), and AdaBoost) and three DL classifiers (deep neural network (DNN), long short-term memory (LSTM) network, and convolutional neural network (CNN)) are used with the 1-D feature representations. In addition, CNN classifiers are built using the 2-D feature representations. The widely used HUPA database is considered in the pathology detection experiments. Experimental results revealed that using the CNN classifier with the 2-D feature representations yielded better accuracy compared to using the ML and DL classifiers with the 1-D feature representations. The best performance was achieved using the 2-D CNN classifier based on dynamic MFCCs, which showed a detection accuracy of 81%. Peer reviewed.
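The 1-D versus 2-D distinction above can be sketched in a few lines: a 2-D representation keeps the full time-frequency matrix for a CNN, while a 1-D representation collapses it to a fixed-length vector for classical classifiers. The shapes and the time-averaging choice below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Toy mel-spectrogram stand-in (40 mel bands x 120 frames); a real one
# would come from an audio front end.
rng = np.random.default_rng(0)
mel_spec = np.abs(rng.standard_normal((40, 120)))

# 2-D representation: keep the time-frequency matrix and add a channel
# axis, giving an image-like input for a 2-D CNN.
x_2d = mel_spec[np.newaxis, :, :]

# 1-D representation: average over time to get a fixed-length vector
# suitable for classical classifiers such as SVM or RF.
x_1d = mel_spec.mean(axis=1)

print(x_2d.shape, x_1d.shape)  # (1, 40, 120) (40,)
```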
Motion pattern recognition in 4D point clouds
We address an actively discussed problem in signal processing: recognizing patterns from spatial data in motion. In particular, we suggest a neural network architecture to recognize motion patterns from 4D point clouds. We demonstrate the feasibility of our approach with point cloud datasets of hand gestures. The architecture, PointGest, directly feeds on unprocessed timelines of point cloud data without any need for voxelization or projection. The model is resilient to noise in the input point cloud through abstraction to lower-density representations, especially for regions of high density. We evaluate the architecture on a benchmark dataset with ten gestures. PointGest achieves an accuracy of 98.8%, outperforming five state-of-the-art point cloud classification models. Peer reviewed.
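The input format described above, a timeline of raw point sets with no voxelization or projection, can be illustrated as a simple tensor layout. The frame and point counts below are arbitrary stand-ins, not the benchmark's actual dimensions.

```python
import numpy as np

# A gesture as a "4D point cloud": a timeline of unordered 3-D point
# sets. Here 30 frames of 256 points each are stacked into a single
# (frames, points, xyz) tensor that a network can consume directly,
# with no voxelization or projection step.
rng = np.random.default_rng(0)
frames = [rng.standard_normal((256, 3)) for _ in range(30)]
clip = np.stack(frames)
print(clip.shape)  # (30, 256, 3)
```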